This report uses the pointblank package to validate the data. In the resulting tables, any failing step should have a CSV button that lets you download a .csv file of just the rows of data that don’t pass that particular validation step.

Check for missing values: Height

Action levels

For most tests, the following criteria are used: a warning if even one row fails, and an error if 2% of rows fail.

The exceptions are —-, —-, —–. These use the strict condition, which will return an ‘error’ if any rows fail.

library(pointblank)
al_default <- action_levels(warn_at = 1, stop_at = 0.02) # warn if even one row fails, error if 2% of rows fail
al_strict <- action_levels(stop_at = 1) # error if even one row fails
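To make the two threshold styles concrete, here is a minimal base-R sketch of how such thresholds can be read. It assumes that a threshold of 1 or more is an absolute count of failing rows, that a threshold below 1 is a fraction of all rows, and that a level is reached when failures meet or exceed the threshold. The helpers `threshold_met()` and `step_state()` are hypothetical names for illustration and are not part of pointblank; unlike a pointblank report, which can flag several levels at once, this sketch returns only the most severe one.

```r
# Hypothetical helper (not a pointblank function): is a threshold reached?
# Assumption: thresholds >= 1 are counts, thresholds < 1 are fractions.
threshold_met <- function(n_fail, n_total, threshold) {
  if (is.null(threshold)) return(FALSE)
  if (threshold >= 1) {
    n_fail >= threshold
  } else {
    (n_fail / n_total) >= threshold
  }
}

# Hypothetical helper: report the most severe level reached by one step.
step_state <- function(n_fail, n_total, warn_at = NULL, stop_at = NULL) {
  if (threshold_met(n_fail, n_total, stop_at)) return("stop")
  if (threshold_met(n_fail, n_total, warn_at)) return("warn")
  "pass"
}

# Default levels: warn at 1 failing row, stop at 2% of rows
step_state(0,  1000, warn_at = 1, stop_at = 0.02)  # "pass"
step_state(8,  1000, warn_at = 1, stop_at = 0.02)  # "warn"
step_state(50, 1000, warn_at = 1, stop_at = 0.02)  # "stop"

# Strict levels: stop as soon as a single row fails
step_state(1, 1000, stop_at = 1)                   # "stop"
```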

Data Validation

These are evaluations of (1) data type, (2) the range of values for shoots, height, and inflorescences, and (3) potential duplicates in HDP_plots.csv and HDP_1997_2009.csv.
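A rough sketch of how checks of this kind can be chained in pointblank is shown here. The toy data frame and its values are invented for illustration, the labels are paraphrased, and the use of `preconditions` to drop missing heights before the integer check is an assumption about how the report was built rather than something the report states. Only a subset of the steps is included.

```r
library(pointblank)

al_default <- action_levels(warn_at = 1, stop_at = 0.02)

# Toy stand-in for the survey table (made-up values, not the real data)
toy <- data.frame(
  plot_id  = c("CF-1", "CF-1", "FF-7"),
  subplot  = c("A1", "B10", "J3"),
  plant_id = c(1, 2, 3),
  ht       = c(12, 250, NA),
  shts     = c(1, 3, 2)
)

agent <- create_agent(tbl = toy, actions = al_default) %>%
  # Step 1: subplot must be one of A1 ... J10
  col_vals_in_set(
    columns = vars(subplot),
    set = paste0(rep(LETTERS[1:10], each = 10), 1:10)
  ) %>%
  # Step 2: height is a whole number (missing heights dropped first; an assumption)
  col_vals_expr(
    expr = ~ ht %% 1 == 0,
    preconditions = function(x) x[!is.na(x$ht), , drop = FALSE],
    label = "Height is measured to nearest cm"
  ) %>%
  # Step 3: height within [0, 200]; missing values are allowed to pass
  col_vals_between(columns = vars(ht), left = 0, right = 200, na_pass = TRUE) %>%
  # Step 4: every row has a plant_id
  col_vals_not_null(columns = vars(plant_id)) %>%
  # Step 5: no repeated plant_id
  rows_distinct(columns = vars(plant_id)) %>%
  interrogate()

all_passed(agent)                 # did every step pass for every row?
get_data_extracts(agent, i = 3)   # failing rows for step 3
```

The rows returned by `get_data_extracts()` should correspond to what the CSV button in the rendered report offers for a failing step.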

Pointblank Validation: Data Validation

Table type: tibble. Thresholds: WARN 1, STOP 0.02, NOTIFY (none).

| STEP | FUNCTION | DESCRIPTION | COLUMNS | VALUES | UNITS | PASS | FAIL |
|---|---|---|---|---|---|---|---|
| 1 | col_vals_in_set() | | subplot | A1 to A10, B1 to B10, C1 to C10, D1 to D10, E1 to E10, F1 to F10, G1 to G10, H1 to H10, I1 to I10, J1 to J10 | 67K | 67K (1.00) | 0 (0.00) |
| 2 | col_vals_in_set() | | plot_id | CF-1, CF-2, CF-3, CF-4, CF-5, CF-6, FF-1, FF-2, FF-3, FF-4, FF-5, FF-6, FF-7 | 67K | 67K (1.00) | 0 (0.00) |
| 3 | col_vals_expr() | Height is measured to nearest cm | | ht %% 1 == 0 | 57K | 57K (1.00) | 0 (0.00) |
| 4 | col_vals_expr() | Shoots is integer | | shts %% 1 == 0 | 57K | 57K (1.00) | 0 (0.00) |
| 5 | col_vals_expr() | Number of inflorescences is integer | | infl %% 1 == 0 | 2K | 2K (1.00) | 0 (0.00) |
| 6 | col_vals_between() | shoots between 0 and 20 | shts | [0, 20] | 67K | 67K (0.99) | 8 (0.01) |
| 7 | col_vals_between() | height between 0 and 200 cm | ht | [0, 200] | 67K | 67K (0.99) | 2 (0.01) |
| 8 | col_vals_between() | inflorescences between 0 and 3 | infl | [0, 3] | 67K | 67K (0.99) | 15 (0.01) |
| 9 | rows_distinct() | duplicated rows | | | 67K | 67K (1.00) | 0 (0.00) |
| 10 | col_vals_not_null() | | plant_id | | 67K | 67K (1.00) | 0 (0.00) |
| 11 | rows_distinct() | Check for duplicate IDs within each year | plant_id | | 3K | 3K (1.00) | 0 (0.00) |
| 12 | rows_distinct() | Check for duplicate IDs within each year | plant_id | | 4K | 4K (1.00) | 0 (0.00) |
| 13 | rows_distinct() | Check for duplicate IDs within each year | plant_id | | 5K | 5K (1.00) | 0 (0.00) |
| 14 | rows_distinct() | Check for duplicate IDs within each year | plant_id | | 6K | 6K (1.00) | 0 (0.00) |
| 15 | rows_distinct() | Check for duplicate IDs within each year | plant_id | | 6K | 6K (1.00) | 0 (0.00) |
| 16 | rows_distinct() | Check for duplicate IDs within each year | plant_id | | 6K | 6K (1.00) | 0 (0.00) |
| 17 | rows_distinct() | Check for duplicate IDs within each year | plant_id | | 6K | 6K (1.00) | 0 (0.00) |
| 18 | rows_distinct() | Check for duplicate IDs within each year | plant_id | | 6K | 6K (1.00) | 0 (0.00) |
| 19 | rows_distinct() | Check for duplicate IDs within each year | plant_id | | 7K | 7K (1.00) | 0 (0.00) |
| 20 | rows_distinct() | Check for duplicate IDs within each year | plant_id | | 5K | 5K (1.00) | 0 (0.00) |
| 21 | rows_distinct() | Check for duplicate IDs within each year | plant_id | | 6K | 6K (1.00) | 0 (0.00) |
| 22 | rows_distinct() | Check for duplicate IDs within each year | plant_id | | 6K | 6K (1.00) | 0 (0.00) |

Started 2023-05-26 01:01:31 UTC; duration 4.8 s; finished 2023-05-26 01:01:36 UTC

Plant Size

This test is for unusually large changes in plant size from one year to the next.

Checks that year to year change in size is reasonable
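The change columns used by these checks (`ht_pc`, `ht_diff`, `shts_diff`) are not defined in the report, so the following base-R sketch shows one plausible way to derive them. It assumes one row per plant per year, that consecutive rows for a plant are consecutive surveys, and that `ht_pc` is the absolute change expressed as a proportion of the previous height (so 2 corresponds to 200%). The `year` column, the helper `lag1()`, and the toy values are all hypothetical.

```r
# Toy data (made-up values): one row per plant per survey year
toy <- data.frame(
  plant_id = c(1, 1, 1, 2, 2),
  year     = c(2000, 2001, 2002, 2000, 2001),
  ht       = c(10, 35, 30, 50, 160),
  shts     = c(1, 2, 8, 3, 3)
)
toy <- toy[order(toy$plant_id, toy$year), ]

# Hypothetical helper: previous value within a group (NA for the first record)
lag1 <- function(x) c(NA, head(x, -1))

toy$ht_prev   <- ave(toy$ht,   toy$plant_id, FUN = lag1)
toy$shts_prev <- ave(toy$shts, toy$plant_id, FUN = lag1)

toy$ht_diff   <- toy$ht   - toy$ht_prev            # change in height (cm)
toy$shts_diff <- toy$shts - toy$shts_prev          # change in shoot number
toy$ht_pc     <- abs(toy$ht_diff) / toy$ht_prev    # proportional change; 2 = 200%

# Rows that would fail at least one of the three checks:
# ht_pc < 2, ht_diff in [-100, 100], shts_diff in [-5, 5]
flagged <- toy[which(toy$ht_pc >= 2 |
                     abs(toy$ht_diff) > 100 |
                     abs(toy$shts_diff) > 5), ]
print(flagged)
```

Each plant's first record has no previous year, so its change columns are `NA` and it is not flagged here; a pointblank step would need `na_pass = TRUE` or a precondition to treat those rows the same way.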

Pointblank Validation: Check growth & regression

Table type: tibble. Thresholds: WARN 1, STOP 0.02, NOTIFY (none).

| STEP | FUNCTION | DESCRIPTION | COLUMNS | VALUES | UNITS | PASS | FAIL |
|---|---|---|---|---|---|---|---|
| 1 | col_vals_lt() | \|% change in height\| < 200% | ht_pc | 2 | 67K | 66K (0.99) | 420 (0.01) |
| 2 | col_vals_between() | \|∆ height\| < 100 cm | ht_diff | [−100, 100] | 67K | 67K (0.99) | 11 (0.01) |
| 3 | col_vals_between() | \|∆ shoot number\| < 5 | shts_diff | [−5, 5] | 67K | 67K (0.99) | 201 (0.01) |

Started 2023-05-26 01:01:39 UTC; duration < 1 s; finished 2023-05-26 01:01:39 UTC

Seedling Size

This searches for seedlings that are unusually large.

Pointblank Validation: Check seedlings

Table type: tibble. Thresholds: WARN 1, STOP 0.02, NOTIFY (none).

| STEP | FUNCTION | DESCRIPTION | COLUMNS | VALUES | UNITS | PASS | FAIL |
|---|---|---|---|---|---|---|---|
| 1 | col_vals_lt() | shoots < 3 | shts | 3 | 3K | 3K (0.99) | 12 (0.01) |
| 2 | col_vals_lt() | height < 30 cm | ht | 30 | 3K | 3K (0.99) | 3 (0.01) |

Started 2023-05-26 01:01:40 UTC; duration < 1 s; finished 2023-05-26 01:01:41 UTC